Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 2 months ago

Wonder how the survey was sent out and whether that affected sampling.

Regardless, with ~3-4k responses, that’s disappointing, if not concerning.

I only have a more personal sense for Lemmy. Do you have a source for Lemmy gender diversity?

Anyway, what do you think are the underlying issues? And what would be some suggestions to the community to address them?

inspxtr@lemmy.world · 6 months ago

sounds like this can be a plot of a new Pixar movie

inspxtr@lemmy.world · 7 months ago

Lol ads that can be engineered into DNA, so that they can be passed down for generations.

inspxtr@lemmy.world · 8 months ago

hm, I think I’ve been doing it wrong then …

inspxtr@lemmy.world · 8 months ago

someone should make an alternate history tv show where the ship made it. bonus if it’s of a parody kind.

inspxtr@lemmy.world · 8 months ago

doesn’t seem to be so comfortable with glasses, esp with a hoodie unfortunately

inspxtr@lemmy.world · 11 months ago

“Bad” can be quite broad, and it might be cumbersome to check and categorize all of the “badness” out there. You might have better luck narrowing it down a bit. For example, if you’re interested in AI/algorithm incidents, there are at least two that came up on search:

inspxtr@lemmy.world · 1 year ago

On a tangential note of another comment about AI training and such, this is a touchy and evolving subject, but it might be good to include how you want your content to be used and not be used, and by whom, especially if you intend to make them public.

inspxtr@lemmy.world · edit-2 1 year ago

some wiki backends allow password protection. for example, mkdocs, which also renders markdown, has mkdocs-encryptcontent-plugin to allow global or even page-specific passwords for private repos.

but these encrypted pages would of course have the risk of not being archived by the wayback machine.

inspxtr@lemmy.world · 1 year ago

How is the entity or power that has the ability to grant me such knowledge connected to the existence of the universe?

inspxtr@lemmy.world · 1 year ago

can people not use that to take each other’s shops down?

inspxtr@lemmy.world · 1 year ago

forgive my naivety, how does such a community avoid promoting ageism?

inspxtr@lemmy.world · 1 year ago

IANAL and I have never had this experience. But intuitively, maybe send them an email/complaint asking about it, eg via their tech support or privacy offices, and referencing the relevant laws. If you need to fill this out quickly, try calling them instead. Give it a few weeks (if via email), then maybe contact relevant offices. If you live in certain states in CA, you may try to get this elevated.

inspxtr@lemmy.world · 1 year ago

the whole premise of OP is that this monitors people, and many organizations use TOTP, which one could also use without internet connections or phones AFAIK.
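To illustrate the offline point: a TOTP code is derived only from a shared secret and the current time, so nothing needs a network connection once the secret is enrolled. Below is a minimal sketch of the RFC 6238 scheme (HMAC-SHA1, 30-second steps) using only the Python standard library. The function names are my own, and the secret in the usage note is the RFC's published test key, not anything a real service uses.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter."""
    # The counter is encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks a 4-byte window.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the number of elapsed steps."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)
```

Calling `totp(b"12345678901234567890", at=59)` reproduces the RFC test value, and `totp(secret)` with no `at` uses the local clock, which is the only "input" the device needs.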

I’m in academia and I wish this were implemented more widely. Data breaches are getting quite common, and GitHub is so entwined in software engineering that it is critical to increase security measures.

inspxtr@lemmy.world · 1 year ago

or maybe most of them in a folder? and one file that defines their locations for environment variables
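One way to read that idea, as a rough sketch only: keep each secret in its own file, and let the environment carry just the file's location. The `NAME_FILE` naming below is an assumption borrowed from a convention some container images use, and `read_secret` is a hypothetical helper, not part of any library.

```python
import os
from pathlib import Path


def read_secret(name: str, env=None):
    """Return a secret, preferring a file whose path is given in NAME_FILE.

    Falls back to the plain NAME variable, and to None if neither is set.
    """
    env = os.environ if env is None else env
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # The environment only holds the location; the value stays on disk.
        return Path(file_path).read_text(encoding="utf-8").strip()
    return env.get(name)
```

With `DB_PASSWORD_FILE=/run/secrets/db_password` set (a made-up path), `read_secret("DB_PASSWORD")` would return the file's contents, so the secret itself never appears in the process environment.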

inspxtr@lemmy.world · 1 year ago

what are the other alternatives to ENV that are more preferred in terms of security?

inspxtr@lemmy.world · edit-2 1 year ago

yeah I guess maybe the formatting and the verbosity seem a bit annoying? Wonder what alternative solutions there could be to better engage people from mastodon, which is what this bot is trying to address.

edit: just to be clear, I’m not affiliated with the bot or its creator. This is just my observation from multiple posts I’ve seen this bot comment on.

inspxtr@lemmy.world · 1 year ago

I’m curious, why is this bot currently being downvoted for almost every comment it makes?

inspxtr@lemmy.world · edit-2 1 year ago

Thanks for the suggestions! I’m actually also looking into llamaindex for more conceptual comparison, though I haven’t gotten to building an app yet.

Any general suggestions for a locally hosted LLM with llamaindex, by the way? I’m also running into some issues with hallucination. I’m using Ollama with llama2-13b and the bge-large-en-v1.5 embedding model.

Anyway, aside from conceptual comparison, I’m also looking for more literal comparison. AFAIK, the choice of embedding model affects how similarity is defined. Most current LLM embedding models are abstract, so the similarity is conceptual: “I have 3 large dogs” and “There are three canine that I own” will probably be very similar. Do you know which embedding model I should choose for a more literal comparison?
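For contrast with embeddings, here is a small sketch of what I mean by "literal": overlap of word n-grams (shingles), scored with Jaccard similarity. This is not an embedding model, just a plain baseline with no dependencies, and the function names are made up for illustration.

```python
import re


def shingles(text: str, n: int = 2) -> set:
    """Set of lowercase word n-grams in the text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Share of n-grams the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

With this measure the two dog sentences above share no bigrams and score 0.0, while "I have 3 large dogs" and "I have 3 big dogs" share two of six bigrams. That is roughly the behavior I would want from a literal comparison.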

That aside, like you indicated, there are some issues. One of them involves length. I hope to find something that can iteratively build up similar paragraphs from similar sentences. I can take a stab at coding it up, but was just wondering if there are similar frameworks out there already that I can model it after.
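A first stab at the sentence-to-paragraph idea could look like the sketch below. It assumes a naive regex sentence splitter and uses `difflib.SequenceMatcher` from the standard library as the per-sentence similarity; the threshold, the minimum run length, and all function names are placeholders I picked, not from any existing framework.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similar(a, b, threshold=0.8):
    """Character-level similarity check between two sentences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def matching_passages(doc, source, threshold=0.8, min_sentences=2):
    """Find runs of consecutive doc sentences that match consecutive source sentences."""
    d, s = split_sentences(doc), split_sentences(source)
    passages = []
    i = 0
    while i < len(d):
        best = 0
        for j in range(len(s)):
            # Extend the run while the next sentence pair still matches.
            k = 0
            while (i + k < len(d) and j + k < len(s)
                   and similar(d[i + k], s[j + k], threshold)):
                k += 1
            best = max(best, k)
        if best >= min_sentences:
            passages.append(" ".join(d[i:i + best]))
            i += best
        else:
            i += 1
    return passages
```

This compares every sentence pair, so it would be slow on a large corpus, but it shows the shape of the approach: match sentences first, then grow matches into passages.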

inspxtr@lemmy.world · edit-2 1 year ago

Selfhosted “plagiarism” checker with custom sources?

inspxtr@lemmy.world · 1 year ago

I wish for a new genie that grants wishes successfully but never tries or succeeds in cursing my wishes.

inspxtr@lemmy.world · 1 year ago

Looking for data about company ownership network

inspxtr@lemmy.world · 1 year ago

Suggestion for Airtable alternative with mobile options?

inspxtr@lemmy.world · 1 year ago

Does discuss/discourse have federation?

inspxtr@lemmy.world · 1 year ago

Is there a day of any given year that is least special?

inspxtr@lemmy.world · 1 year ago

Comment systems for static pages (Jekyll)?